The Geometry of Kernelized Spectral Clustering by Geoffrey Schiebinger
Abstract
Clustering of data sets is a standard problem in many areas of science and engineering. The method of spectral clustering is based on embedding the data set using a kernel function, and using the top eigenvectors of the normalized Laplacian to recover the connected components. We study the performance of spectral clustering in recovering the latent labels of i.i.d. samples from a finite mixture of nonparametric distributions. The difficulty of this label recovery problem depends on the overlap between mixture components and how easily a mixture component is divided into two nonoverlapping components. When the overlap is small compared to the indivisibility of the mixture components, the principal eigenspace of the population-level normalized Laplacian operator is approximately spanned by the square-root kernelized component densities. In the finite sample setting, and under the same assumption, embedded samples from different components are approximately orthogonal with high probability when the sample size is large. As a corollary we control the fraction of samples mislabeled by spectral clustering under finite mixtures with nonparametric components.
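The pipeline described in the abstract (kernel matrix, normalized Laplacian, top eigenvectors, clustering of the embedded points) can be sketched as follows. This is a minimal illustration rather than the authors' implementation. The Gaussian kernel, its bandwidth, the two-blob test data, the D^{-1/2} K D^{-1/2} form of the normalized Laplacian, and the small k-means helper are all assumptions chosen for the example.

```python
import numpy as np

def spectral_embedding(X, k, bandwidth=1.0):
    """Embed each sample as a row of the top-k eigenvectors of D^{-1/2} K D^{-1/2}."""
    # Gaussian kernel matrix (kernel choice and bandwidth are assumptions).
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    # Symmetric normalization by the degrees d_i = sum_j K_ij.
    d = K.sum(axis=1)
    L = K / np.sqrt(np.outer(d, d))
    # eigh returns eigenvalues in ascending order, so the last k columns
    # are the eigenvectors with the largest eigenvalues.
    _, vecs = np.linalg.eigh(L)
    V = vecs[:, -k:]
    # Row-normalize so that only the direction of each embedded point matters.
    return V / np.linalg.norm(V, axis=1, keepdims=True)

def kmeans(Z, k, n_iter=50):
    """Plain Lloyd iterations with deterministic farthest-point initialization."""
    centers = [Z[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

# Hypothetical test data: i.i.d. samples from a two-component mixture
# with little overlap between the components.
rng = np.random.default_rng(0)
n = 100
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.5, size=(n, 2)),
    rng.normal(loc=(6.0, 0.0), scale=0.5, size=(n, 2)),
])
y = np.repeat([0, 1], n)  # latent labels, used only for evaluation

Z = spectral_embedding(X, k=2, bandwidth=1.0)
labels = kmeans(Z, k=2)

# Cluster labels are only defined up to permutation.
accuracy = max(np.mean(labels == y), np.mean(labels != y))
# Mean absolute inner product between embedded points of different components;
# values near zero mean the two groups are close to orthogonal.
cross = np.abs(Z[y == 0] @ Z[y == 1].T).mean()
print(f"accuracy={accuracy:.3f}, mean |<z_i, z_j>| across components={cross:.3f}")
```

When the components overlap little and each is hard to split, as in this toy setup, the embedded points from different components should come out nearly orthogonal, which is what makes the final clustering step easy. Moving the two means closer together or shrinking the bandwidth is a quick way to see that structure degrade.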
Similar resources
Accuracy Evaluation of The Depth of Six Kinds of Sperm Counting Chambers for both Manual and Computer-Aided Semen Analyses
Background: Although the depth of the counting chamber is an important factor influencing sperm counting, no research has yet been reported on the measurement and comparison of the depth of the chamber. We measured the exact depths of six kinds of sperm counting chambers and evaluated their accuracy. Materials and Methods: In this prospective study, the depths of six kinds of sperm counting chamber...
Kernel methods in computer vision: object localization, clustering, and taxonomy discovery
In this thesis we address three fundamental problems in computer vision using kernel methods. We first address the problem of object localization, which we frame as the problem of predicting a bounding box around an object of interest. We develop a framework in Chapter II for applying a branch and bound optimization strategy to efficiently and optimally detect a bounding box that maximizes obje...
Towards Finding a New Kernelized Fuzzy C-means Clustering Algorithm
The Kernelized Fuzzy C-Means clustering technique is an attempt to improve the performance of the conventional Fuzzy C-Means clustering technique. This technique, in which a kernel-induced distance function replaces the Euclidean distance of conventional Fuzzy C-Means as the similarity measure, has recently gained popularity in the research community. Like th...
Review and Comparison of Kernel Based Fuzzy Image Segmentation Techniques
This paper presents a detailed study and comparison of some Kernelized Fuzzy C-Means clustering based image segmentation algorithms. Four fuzzy clustering algorithms are used: Fuzzy C-Means (FCM), Kernel Fuzzy C-Means (KFCM), Intuitionistic Kernelized Fuzzy C-Means (KIFCM), and Kernelized Type-II Fuzzy C-Means (KT2FCM). The four algorithms are studied and analyzed both quantitatively and qual...
Publication date: 2015